Analytic generation of synthesis units by closed loop training for totally speaker driven text to speech system (TOS drive TTS)
Authors
Abstract
This paper presents a new method for automatically generating speech synthesis units. The algorithm, called Closed-Loop Training (CLT), is based on evaluating and reducing the distortion in synthesized speech. It analytically minimizes the distortion introduced by the synthesis process, such as prosodic modification. The distortion is measured as the error between synthesized speech units and natural speech units in a large speech database (corpus). CLT thus generates the synthesis units that most closely resemble natural speech after the synthesis process. In this paper, CLT is applied to a waveform-concatenation-based synthesizer whose basic unit is a diphone. With CLT, the synthesizer generates clear and smooth synthetic speech even with a relatively small volume of synthesis units.
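The abstract does not give the CLT equations, so the following is only a minimal sketch of what an analytic (closed-form) distortion minimization could look like. It assumes, purely for illustration, that the synthesis of each training segment can be modeled as a fixed linear operator A_j applied to the unit u (for example, a stretch or shrink standing in for prosodic modification), and that distortion is the summed squared error against the natural segments r_j. Under that assumption the optimal unit solves the normal equations (sum_j A_j^T A_j) u = sum_j A_j^T r_j. All names (`train_unit`, `total_distortion`, `OPERATORS`) and the toy operators are hypothetical and are not taken from the paper.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def total_distortion(unit, operators, targets):
    """Summed squared error between synthesized and natural segments."""
    return sum(
        (r_k - s_k) ** 2
        for A, r in zip(operators, targets)
        for r_k, s_k in zip(r, matvec(A, unit))
    )


def train_unit(operators, targets):
    """Return the unit u minimizing sum_j ||targets[j] - operators[j] u||^2.

    Assumption: each operator is a linear model of the synthesis step.
    """
    n = len(operators[0][0])
    M = [[0.0] * n for _ in range(n)]  # accumulates sum_j A_j^T A_j
    v = [0.0] * n                      # accumulates sum_j A_j^T r_j
    for A, r in zip(operators, targets):
        for row, r_k in zip(A, r):
            for i in range(n):
                v[i] += row[i] * r_k
                for j in range(n):
                    M[i][j] += row[i] * row[j]
    return solve(M, v)


# Toy "synthesis" operators on a 3-sample unit (hypothetical):
# no modification, stretch to 4 samples, shrink to 2 samples.
OPERATORS = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0], [1 / 3, 2 / 3, 0.0], [0.0, 2 / 3, 1 / 3], [0.0, 0.0, 1.0]],
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
]
```

In this sketch the loop is "closed" in the sense that the unit is scored after it has passed through the modeled synthesis step, instead of being picked directly from the recorded segments. A real waveform synthesizer would need a far more detailed model of pitch and duration modification than these toy matrices.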
Similar resources
Automatic rule generation for linguistic features analysis using inductive learning technique: linguistic features analysis in TOS drive TTS system
Linguistic feature analysis of the input text plays an important role in achieving natural prosodic control in text-to-speech (TTS) systems. In a conventional scheme, experts manually refine suspicious if-then rules and change the tree structure to obtain correct analysis results when input texts have been analyzed incorrectly. However, altering the tree structure drastically is difficul...
Machine Speech Chain with One-shot Speaker Adaptation
In previous work, we developed a closed-loop speech chain model based on deep learning, in which the architecture enabled the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components to mutually improve their performance. This was accomplished by the two parts teaching each other using both labeled and unlabeled data. This approach could significantly improve model perfo...
Automatic corpus-based training of rules for prosodic generation in text-to-speech
In this paper, we discuss a methodology for automatic prosodic modeling in Text-to-Speech (TTS) systems. The proposed methodology can be seen as a data-driven strategy to train prosodic rules from the automatic analysis of a specific text and its related speech material. Therefore, our corpus-based training procedure is based on an automatic linguistic analysis of the text and on an acoustic an...
HMM-Based Distributed Text-to-Speech Synthesis Incorporating Speaker-Adaptive Training
In this paper, a hidden Markov model (HMM) based distributed text-to-speech (TTS) system is proposed to synthesize the voices of various speakers in a client-server framework. The proposed system is based on speaker-adaptive training for constructing HMMs corresponding to a target speaker, and its computational complexity is balanced by distributing the processing modules of the TTS system at b...
Corpus-based techniques in the AT&T nextgen synthesis system
The AT&T text-to-speech (TTS) synthesis system has been used as a framework for experimenting with a perceptually guided data-driven approach to speech synthesis, with primary focus on data-driven elements in the "back end". Statistical training techniques applied to a large corpus are used to make decisions about predicted speech events and selected speech inventory units. Our recent advances i...